Abstract

The modeling and analysis of networks and network data has seen an
explosion of interest in recent years and represents an exciting direction for
potential growth in statistics. Despite the already substantial amount of work
done in this area to date by researchers from various disciplines, however,
there remain many questions of a decidedly foundational nature (natural
analogues of standard questions already posed and addressed in more
classical areas of statistics) that have yet to even be posed, much less
addressed. Here we raise and consider one such question in connection with
network modeling. Specifically, we ask, "Given an observed network, what
is the sample size?" Using simple, illustrative examples from the class of
exponential random graph models, we show that the answer to this question
can very much depend on basic properties of the networks expected under
the model, as the number of vertices $N_v$ in the network grows. In particular,
we show that whether the networks are sparse or not under our model
(i.e., having relatively few or many edges between vertices, respectively) is
sufficient to change the asymptotic rates for maximum likelihood parameter
estimation by an order of magnitude, from $N_v^{1/2}$ to $N_v$. We then explore
some practical implications of this result, using both simulation and data on
food-sharing from Lamalera, Indonesia.

Keywords: Asymptotic normality; Consistency; Exponential random graph 
model; Maximum likelihood. 



1 Introduction 

Since roughly the mid-1990s, the study of networks has increased dramatically.
Researchers from across the sciences (including biology, bioinformatics,
computer science, economics, engineering, mathematics, physics, sociology, and
statistics) are more and more involved with the collection and statistical
analysis of data associated with networks. As a result, statistical methods and
models are being developed in this area at a furious pace, with contributions
coming from a wide spectrum of disciplines. See, for example, Jackson (2008),
Kolaczyk (2009), and Newman (2010) for recent overviews from the perspective of
economics, statistics, and statistical physics, respectively.

A network is typically represented mathematically by a graph, say,
$G = (V, E)$, where $V$ is a set of vertices (commonly written
$V = \{1, \ldots, N_v\}$) and $E$ is a set of $N_e$ edges (represented as vertex
pairs $(u, v) \in E$). Edges can be either directed (wherein $(u, v)$ is distinct
from $(v, u)$) or undirected. Prominent examples of networks represented in this
fashion include the World Wide Web graph (with vertices representing web-pages
and directed edges representing hyper-links pointing from one page to another),
protein-protein interaction networks in biology (with vertices representing
proteins and undirected edges representing an affinity for two proteins to bind
physically), and friendship networks (with vertices representing people and
edges representing friendship nominations in a social survey).

A great deal of attention in the literature has been focused on the natural
problem of modeling networks. There are by now a wide variety of network models
that have been proposed, ranging from models of largely mathematical interest to
models designed to be fit statistically to data. See, for example, the sources
cited above or, for a shorter treatment, the review paper by Airoldi et al.
(2009). The derivation and study of network models is a unique endeavor, due to
a number of factors. First, the defining aspect of networks is their relational
nature, and hence the task is effectively one of modeling complex dependencies
among the vertices. Second, quite often there is no convenient space associated
with the network, and so the type of distance and geometry that can be exploited
in modeling other dependent phenomena, like time series and spatial processes,
generally are not available when modeling networks. Finally, network problems
frequently are quite large, involving hundreds if not thousands or hundreds of
thousands of vertices and their edges. Since a network of $N_v$ vertices can in
principle have on the order of $O(N_v^2)$ edges, in network modeling and analysis
(particularly statistical analysis of network data) the sheer magnitude of the
network can be a critical factor.

Suppose that we observe a network, in the form of a directed graph
$G = (V, E)$, where $V$ is a set of $N_v = |V|$ vertices and $E$ is a set of
ordered vertex pairs, indicating edges. Alternatively, we may think of $G$ in
terms of its $N_v \times N_v$ adjacency matrix $Y$, where $Y_{ij} = 1$ if
$(i, j) \in E$, and 0 otherwise. What is our sample size in this setting? At the
August 2010 opening workshop of the recent Program on Complex Networks, held at
the Statistical and Applied Mathematical Sciences Institute (SAMSI), in North
Carolina, USA, this question in fact evoked three different responses:

(1) it is the number of unique entries in $Y$, i.e., $N_v(N_v - 1)$;

(2) it is the number of vertices, i.e., $N_v$; or

(3) it is the number of networks, i.e., one.

Which answer is correct? 

Despite the already vast literature on network modeling, to the best of our 
knowledge this question has yet to be formally posed much less answered. That 
this should be so is particularly curious given that the analogous questions have 
been asked and answered in other areas involving dependent data, most notably 
in time series analysis and the analysis of spatial data. Specifically, in both of the 
latter contexts, it is often possible to show that, whereas nominally the asymp- 
totic variance of maximum likelihood estimates for parameters scales inversely 
with the sample size n, under dependency a different scaling obtains, reflecting a 
combination of (a) the nominal sample size n, and (b) the dependency structure 
in the data. Thus, it seems not unreasonable to hope that similar results might be
produced in the context of networks, with the asymptotics shown to be a function
of the number of vertices $N_v$, modified by characteristics of the network
structure itself.

Following similar practice in these other fields, therefore, we will interpret the
scaling of the asymptotic variances of maximum likelihood estimates in a network
model as an effective sample size. In this paper we provide some initial insight
into the question of what is the effective sample size in network modeling,
focusing on the impact of what is arguably the most fundamental of network
characteristics: sparsity. A now commonly acknowledged characteristic of
real-world networks is that the actual number of edges tends to scale much more
like the number of vertices (i.e., $O(N_v)$) than the nominal number of edges
(i.e., $O(N_v^2)$). Here we demonstrate that two very different regimes of
asymptotics, corresponding to responses 1 and 2 above, obtain for maximum
likelihood estimates in the context of a simple case of the popular exponential
random graph models, under non-sparse and sparse variants of the models.

Notably, the specification of our models and the derivations of our results all 
utilize concepts and tools accessible to a first-year graduate student in statistics. 
Accordingly, our results serve to highlight in a straightforward and illustrative 
manner how the question of effective sample size in network settings can in fact 
be expected to be non-trivial and that the answer in general is likely to be subtle, 
depending substantially on basic model assumptions. 

The rest of this paper is organized as follows. Some background and
definitions are provided in Section 2. Our main results are presented in
Section 3, first for the case where edges arise as independent coin flips and,
second, for the case in which flips corresponding to edges to and from a given
pair of vertices are dependent. We then illustrate some practical implications
of our results, through a simulation study in Section 4, exploring coverage of
confidence intervals associated with our asymptotic arguments, and through
application to food-sharing networks in Section 5, where we examine the extent
to which real-world data can be found to support non-sparse versus sparse
variants of our models. Finally, some additional discussion may be found in
Section 6.



2 Background 



There are many models for networks. See Kolaczyk (2009), Chapter 6, or the
review paper by Airoldi et al. (2009). The class of exponential random graph
models has a history going back roughly 30 years and is particularly popular with
practitioners in social network analysis. This class of models specifies that the
distribution of the adjacency matrix $Y$ follow an exponential family form, i.e.,
$P_\theta(Y = y) \propto \exp\{\theta^\top g(y)\}$, for vectors $\theta$ of
parameters and $g(\cdot)$ of sufficient statistics. However, despite this
seemingly appealing feature, work in the last five years has shown that
exponential random graph models must be handled with some care, as both their
theoretical properties and computational tractability can be rather sensitive to
model specification. See Robins et al. (2007), for example, and Chatterjee and
Diaconis (2011), for a more theoretical treatment.

Here we concern ourselves only with certain examples of the simplest type of
exponential random graph models, wherein the dyads $(Y_{ij}, Y_{ji})$ and
$(Y_{k\ell}, Y_{\ell k})$ are assumed independent, for $(i, j) \neq (k, \ell)$,
and identically distributed. These independent dyad models arguably have the
smallest amount of dependency to still be interesting as network models. A
variant of the models introduced by Holland and Leinhardt (1981), they are in
fact too simple to be appropriate for modeling in most situations of practical
interest. However, they are ideal for our purposes, as they allow us to quickly
obtain non-trivial insight into the question of effective sample size in network
modeling, using relatively standard tools and arguments. The models we consider
are all variations of the form



$$
P_{\alpha,\beta}(Y = y)
  = \prod_{i<j} \frac{\exp\{\alpha (y_{ij} + y_{ji}) + \beta\, y_{ij} y_{ji}\}}
                     {1 + 2e^{\alpha} + e^{2\alpha+\beta}}
  = \left( \frac{1}{1 + 2e^{\alpha} + e^{2\alpha+\beta}} \right)^{\binom{N_v}{2}}
    \times \exp\{\alpha\, s(y) + \beta\, m(y)\},
\qquad (1)
$$

with sufficient statistics

$$
s(y) = \sum_{i<j} (y_{ij} + y_{ji})
\quad \text{and} \quad
m(y) = \sum_{i<j} y_{ij} y_{ji},
\qquad (2)
$$

a so-called Bernoulli model with reciprocity. The parameter $\alpha$ governs the
propensity of pairs of vertices $i$ and $j$ to form an edge $(i, j)$, and the
parameter $\beta$ governs the tendency towards reciprocity, forming an edge
$(j, i)$ that reciprocates $(i, j)$. Of interest will be both this general model
and the restricted model $P_\alpha = P_{\alpha,0}$, wherein $\beta = 0$ and there
is no reciprocity. We will refer to this latter model simply as the Bernoulli
model. Realizations of networks from this model without and with reciprocity
(holding expected edge count $s(y)$ fixed) are given in Figure 1(a) and (b),
respectively.
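Because the dyads in model (1) are independent, a network can be drawn by sampling each dyad from its four possible states. The following is a minimal sketch of that idea, not the authors' code; the function names and the parameter values used in the example call are illustrative assumptions.

```python
import math
import random


def dyad_probabilities(alpha, beta):
    """Probabilities of the four states (y_ij, y_ji) of one dyad under model (1)."""
    weights = {
        (0, 0): 1.0,
        (1, 0): math.exp(alpha),
        (0, 1): math.exp(alpha),
        (1, 1): math.exp(2 * alpha + beta),
    }
    z = sum(weights.values())  # 1 + 2e^alpha + e^(2alpha + beta)
    return {state: w / z for state, w in weights.items()}


def sample_network(n_v, alpha, beta, rng):
    """Draw an adjacency matrix by sampling every dyad independently."""
    probs = dyad_probabilities(alpha, beta)
    states = list(probs)
    weights = [probs[state] for state in states]
    y = [[0] * n_v for _ in range(n_v)]
    for i in range(n_v):
        for j in range(i + 1, n_v):
            y[i][j], y[j][i] = rng.choices(states, weights=weights)[0]
    return y


def sufficient_statistics(y):
    """Return (s, m): the number of edges and the number of mutual dyads."""
    n_v = len(y)
    pairs = [(i, j) for i in range(n_v) for j in range(i + 1, n_v)]
    s = sum(y[i][j] + y[j][i] for i, j in pairs)
    m = sum(y[i][j] * y[j][i] for i, j in pairs)
    return s, m


# Illustrative values only (assumed, not taken from the paper).
rng = random.Random(0)
y = sample_network(50, alpha=-3.0, beta=2.0, rng=rng)
s, m = sufficient_statistics(y)
```

Setting `beta=0.0` recovers the plain Bernoulli model, in which the two edges of a dyad are independent coin flips with success probability $e^{\alpha}/(1 + e^{\alpha})$.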

Importantly, in both the Bernoulli model and the Bernoulli model with
reciprocity, we will examine the question of effective sample size under both the
original model parameterization and a reparameterization in which parameter(s)
are shifted by a value $\log N_v$. Krivitsky et al. (2011) introduced such shifts
as a way of adjusting models like (1) for network size such that realizations
with fixed $\alpha$ and $\beta$ would produce network distributions with
asymptotically constant expected mean degree, $E_{\alpha,\beta}[s(Y)/N_v]$, for
varying $N_v$. That is, a configuration $(\alpha, \beta)$ that would produce a
typical $N_v = 100$ realization like that in Figure 1(a) would produce an
$N_v = 200$ realization like that in Figure 1(d). The model's baseline asymptotic
behavior is to have a constant expected density,
$E_{\alpha,\beta}[s(Y)/\{N_v(N_v - 1)\}]$, such that a parameter configuration
that would produce a network like 1(a) for $N_v = 100$ would produce a network
like 1(c) for $N_v = 200$. Motivated by similar concerns, we use the presence or
absence of such shifts to produce two different types of asymptotic behavior in
our network model classes, corresponding to sparse (asymptotically finite mean
degree) and non-sparse (asymptotically infinite mean degree) networks,
respectively. Because it is widely recognized that most large real-world
networks are sparse networks, this distinction is critical, and, as we show
below, it has fundamental implications on effective sample size.

Figure 1: Sampled networks drawn from four configurations of (1). Panels:
(a) $N_v = 100$, $s(y) \approx 100$; (b) $N_v = 100$, $s(y) \approx 100$,
$m(y) \approx 25$; (c) $N_v = 200$, preserve density of (a); (d) $N_v = 200$,
preserve mean degree of (a). (a) shows a realization from a model with expected
mean degree 1 on 100 vertices, and no reciprocity effect. (b) shows a
realization from a model with the same network size and mean degree as (a), but
with reciprocity parameter $\beta$ set such that the expected number of mutual
ties is 25. (c) is a realization of the model from (a) scaled to 200 vertices,
preserving density, while (d) preserves mean degree.



3 Main Results 
3.1 Bernoulli Model 

We first present our results for the Bernoulli model. Let $P_\alpha$ denote the
model $P_{\alpha,0}$, as defined above, and let $P^{\dagger}_\alpha$ denote the
same model, but under the mapping $\alpha \mapsto \alpha - \log N_v$ of the
density parameter. Then, it is easy to show that under $P_\alpha$ the mean vertex
in- and out-degree tends to infinity and the network density stays at
$\operatorname{logit}^{-1}(\alpha)$ as $N_v \to \infty$, while under
$P^{\dagger}_\alpha$ the mean degree tends to $e^{\alpha}$ while the density
tends to zero. In fact, under $P^{\dagger}_\alpha$ the in- and out-degree
distributions tend to a Poisson law with the stated mean.
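The contrast between the two regimes can be checked numerically. The sketch below assumes only that each directed edge in the Bernoulli model is an independent coin flip whose log-odds are either $\alpha$ (no offset) or $\alpha - \log N_v$ (offset); the function names are illustrative.

```python
import math


def edge_probability(alpha, n_v, sparse):
    """Edge probability in the Bernoulli model; sparse=True applies the -log(N_v) offset."""
    eta = alpha - math.log(n_v) if sparse else alpha
    return 1.0 / (1.0 + math.exp(-eta))


def summarize(alpha, n_v, sparse):
    """Expected density and expected mean out-degree for a network on n_v vertices."""
    p = edge_probability(alpha, n_v, sparse)
    return {"density": p, "mean_degree": (n_v - 1) * p}


alpha = 0.0
for n_v in (100, 1000, 10000):
    # Without the offset the density is constant and the mean degree grows with n_v;
    # with the offset the mean degree approaches exp(alpha) and the density shrinks.
    print(n_v, summarize(alpha, n_v, sparse=False), summarize(alpha, n_v, sparse=True))
```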

From the perspective of traditional random graph theory, the offset model
of Krivitsky et al. (2011) is asymptotically equivalent to the standard
formulation of an Erdős-Rényi random graph, in which the probability of an edge
scales like $e^{\alpha}/N_v$. Alternatively, from the perspective of social
network theory, it is useful to examine the log-odds that $Y_{ij} = 1$,
conditional on the status $Y_{[-ij]} = y_{[-ij]}$ of all other edges, i.e.,

$$
\log \frac{P(Y_{ij} = 1 \mid Y_{[-ij]} = y_{[-ij]})}
          {P(Y_{ij} = 0 \mid Y_{[-ij]} = y_{[-ij]})}.
$$

This quantity goes from being a constant value $\alpha$ under $P = P_\alpha$ to a
value $\alpha - \log N_v$ under $P^{\dagger}_\alpha$. This reflects the intuition
that as long as there is a cost associated with forming and maintaining a network
tie, an individual will be able to maintain ties with a shrinking fraction of the
network as the network grows, with the average number of maintained ties being
unaffected by the growth of the network beyond a certain point (Krivitsky et al.,
2011).



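Given the observation of a network $Y$ randomly generated with respect to
either of these models, initial insight into the effective sample size can be
obtained by studying the asymptotic behavior of the Fisher information, which we
denote $\mathcal{I}(\alpha)$ and $\mathcal{I}^{\dagger}(\alpha)$ under $P_\alpha$
and $P^{\dagger}_\alpha$, respectively. Straightforward calculation shows that
while

$$
\mathcal{I}(\alpha) = N_v (N_v - 1)\, \frac{e^{\alpha}}{(1 + e^{\alpha})^2},
$$

in contrast,

$$
\mathcal{I}^{\dagger}(\alpha)
  = N_v (N_v - 1)\, \frac{N_v e^{\alpha}}{(N_v + e^{\alpha})^2}.
$$

So $\mathcal{I}(\alpha) = O(N_v^2)$, while $\mathcal{I}^{\dagger}(\alpha) = O(N_v)$,
a difference by an order of magnitude.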
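A quick numerical check of the two growth rates is sketched below. It relies on the standard exponential-family fact that the information on $\alpha$ equals the variance of the edge count, $N_v(N_v - 1)\,p(1 - p)$, where $p$ is the edge probability; the helper names are illustrative assumptions. Doubling $N_v$ should multiply the information by about 4 without the offset and by about 2 with it.

```python
import math


def fisher_information(alpha, n_v, sparse):
    """Information on alpha: the variance of s(Y), N_v (N_v - 1) p (1 - p)."""
    eta = alpha - math.log(n_v) if sparse else alpha
    p = 1.0 / (1.0 + math.exp(-eta))
    return n_v * (n_v - 1) * p * (1.0 - p)


def growth_ratio(alpha, n_v, sparse):
    """Factor by which the information grows when the number of vertices doubles."""
    return fisher_information(alpha, 2 * n_v, sparse) / fisher_information(alpha, n_v, sparse)


for n_v in (100, 1000, 10000):
    print(n_v, growth_ratio(0.5, n_v, sparse=False), growth_ratio(0.5, n_v, sparse=True))
```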

The implications of this difference are immediately apparent when we con- 
sider the asymptotic behavior of the maximum likelihood estimates of a under 
the two models. 

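Theorem 1. Let $\hat\alpha$ and $\hat\alpha^{\dagger}$ denote the maximum
likelihood estimates of the parameter $\alpha_0$ under the models $P_{\alpha_0}$
and $P^{\dagger}_{\alpha_0}$, respectively, where
$\alpha_0 \in [\alpha_{\min}, \alpha_{\max}]$, for finite $\alpha_{\min}$,
$\alpha_{\max}$. Then under the model $P_{\alpha_0}$, the estimator $\hat\alpha$
is $N_v$-consistent for $\alpha_0$, and

$$
\sqrt{\binom{N_v}{2}}\, (\hat\alpha - \alpha_0)
  \to N\!\left( 0,\ \frac{(1 + e^{\alpha_0})^2}{2 e^{\alpha_0}} \right),
$$

while under the model $P^{\dagger}_{\alpha_0}$, the estimator
$\hat\alpha^{\dagger}$ is $N_v^{1/2}$-consistent for $\alpha_0$, and

$$
\sqrt{N_v}\, (\hat\alpha^{\dagger} - \alpha_0) \to N(0,\ e^{-\alpha_0}).
$$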



The proof of these results uses largely standard techniques for asymptotics of
estimating equations, but with a few interesting twists. Note that, for fixed
$N_v$, the dyads $(Y_{ij}, Y_{ji})$ constitute $N_v(N_v - 1)/2$ independent and
identically distributed bivariate random variables under both $P_\alpha$ and
$P^{\dagger}_\alpha$. Consistency of the estimators in both cases can be argued by
verifying, for example, the conditions of Theorem 5.9 of Van der Vaart (2000) for
consistency of estimating equations. Similarly, the proof of asymptotic normality
of the estimators can be based on the usual technique of a Taylor series
expansion of the log-likelihood and, due to the fact that we have assumed an
exponential family distribution, the asymptotic normality of the sufficient
statistic $s(y)$ in (2). However, in the case of the sparse model
$P^{\dagger}_\alpha$, the dyads $\{(Y_{ij}, Y_{ji})\}_{i<j}$ follow a different
distribution for each $N_v$, and therefore an array-based central limit theorem
is required to show the asymptotic normality of $s(y)$. But since increasing the
number of vertices from, say, $N_v - 1$ to $N_v$, as $N_v \to \infty$, increases
the number of dyads in our model by $N_v - 1$, a standard triangular array
central limit theorem is not appropriate here. Rather, a double array central
limit theorem is needed, such as Theorem 7.1.2 of Chung (2001). A full derivation
is provided in Appendix A.2.



3.2 Bernoulli Model with Reciprocity 

From Theorem 1 we see that the effective sample size in this context can be of
order either $N_v^2$ or $N_v$, depending on the scaling of the assumed model,
i.e., on whether the model is sparse or not. From a non-network perspective,
these results can be largely anticipated by the rescaling involved, in that the
transformation $\alpha \mapsto \alpha - \log N_v$ induces a rescaling of the
expected number of edges by $N_v^{-1}$. Now, however, consider the full Bernoulli
model with reciprocity, $P_{\alpha,\beta}$, defined in (1). Even with just two
parameters the situation becomes notably more subtle.

Let $\mathcal{I}(\alpha, \beta)$ be the $2 \times 2$ Fisher information matrix
under this model. Then calculations analogous to those required for our previous
results show that $\mathcal{I}(\alpha, \beta) = O(N_v^2)$ and, similarly,
asymptotic properties of the maximum likelihood estimate of $(\alpha, \beta)$
analogous to those for $P_\alpha$ hold.

Let us focus then on sparse versions of $P_{\alpha,\beta}$. The offset used
previously, i.e., mapping $\alpha$ to $\alpha - \log N_v$, is not by itself
satisfactory. Call the resulting model $P^{\dagger}_{\alpha,\beta}$. Standard
arguments show that the limiting in- and out-degree distributions under this
model will be Poisson with mean parameter $e^{\alpha}$. On the other hand, the
expected number of reciprocated out-ties a vertex has,
$E^{\dagger}_{\alpha,\beta}[2m(Y)/N_v]$, behaves like $e^{2\alpha+\beta}/N_v$,
and therefore tends to zero as $N_v \to \infty$. Thus, $\beta$ plays no role in
the limiting behavior of the model, and, indeed, reciprocity vanishes. This fact
can also be understood through examination of the Fisher information matrix, say
$\mathcal{I}^{\dagger}(\alpha, \beta)$, in that direct calculation shows

$$
\mathcal{I}^{\dagger}(\alpha, \beta) =
\begin{pmatrix} O(N_v) & O(1) \\ O(1) & O(1) \end{pmatrix}.
$$

That is, only the information on $\alpha$ grows with the network. Under
$P^{\dagger}_{\alpha,\beta}$, only the affinity parameter $\alpha$ can be inferred
in a reliable manner.

However, the same intuition that suggests that as the network becomes larger,
a given actor $i$ will have an opportunity for contact with a smaller and smaller
fraction of it also suggests that if there is a preexisting relationship in the
form of a tie from $j$ to $i$, such an opportunity likely exists regardless of how
large the network may be. This, as well as direct examination of the exact
expression for the information matrix, suggests that the $-\log N_v$ penalty on
tie log-probability should not apply to reciprocating ties, which may be
implemented by mapping $\beta \mapsto \beta + \log N_v$. Call this model, in which
$P^{\dagger}_{\alpha,\beta}$ is augmented with this additional offset for $\beta$,
the model $P^{\ddagger}_{\alpha,\beta}$. The corresponding conditional log-odds of
an edge now have the form

$$
\log \frac{P(Y_{ij} = 1 \mid Y_{[-ij]} = y_{[-ij]})}
          {P(Y_{ij} = 0 \mid Y_{[-ij]} = y_{[-ij]})}
=
\begin{cases}
  \alpha - \log N_v, & \text{if } y_{ji} = 0, \\
  \alpha + \beta,    & \text{if } y_{ji} = 1,
\end{cases}
$$

which exactly captures the intuition described.

It can be shown that under $P^{\ddagger}_{\alpha,\beta}$ we have
$\mathcal{I}^{\ddagger}(\alpha, \beta) = O(N_v)$, indicating that information on
both parameters grows at the same rate in $N_v$. It can also be shown that the
limiting in- and out-degree distribution is now Poisson with mean parameter
$e^{\alpha} + e^{2\alpha+\beta}$, and that
$E^{\ddagger}_{\alpha,\beta}[2m(Y)/N_v]$ tends to $e^{2\alpha+\beta}$. So, both
parameters play a role in the limiting behavior of the model and the additional
offset induces an asymptotically constant expected per-vertex reciprocity in
addition to asymptotically constant expected mean degree.
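These limits can be verified directly from the dyad probabilities. The sketch below applies both offsets ($\alpha - \log N_v$ and $\beta + \log N_v$) to the dyad weights of the Bernoulli model with reciprocity and evaluates the expected per-vertex statistics for a large network; the function names and the parameter values are illustrative assumptions.

```python
import math


def dyad_probs_double_offset(alpha, beta, n_v):
    """Probabilities of one specific one-way tie and of a mutual pair, with both offsets."""
    a = alpha - math.log(n_v)
    b = beta + math.log(n_v)
    w_asym = math.exp(a)            # weight of each of the two one-way states
    w_mutual = math.exp(2 * a + b)  # weight of the reciprocated state
    z = 1.0 + 2.0 * w_asym + w_mutual
    return w_asym / z, w_mutual / z


def per_vertex_expectations(alpha, beta, n_v):
    """Return (E[s(Y)/N_v], E[2 m(Y)/N_v]) under the doubly offset model."""
    p_asym, p_mutual = dyad_probs_double_offset(alpha, beta, n_v)
    n_dyads = n_v * (n_v - 1) / 2
    # A one-way dyad contributes one edge, a mutual dyad contributes two.
    mean_degree = n_dyads * (2 * p_asym + 2 * p_mutual) / n_v
    reciprocated = n_dyads * 2 * p_mutual / n_v
    return mean_degree, reciprocated


alpha, beta = -1.0, 0.5
mean_degree, reciprocated = per_vertex_expectations(alpha, beta, 10**7)
limit_degree = math.exp(alpha) + math.exp(2 * alpha + beta)
limit_reciprocated = math.exp(2 * alpha + beta)
```

For large `n_v`, `mean_degree` and `reciprocated` should agree closely with `limit_degree` and `limit_reciprocated`, respectively.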

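Finally, we have the following analogue of Theorem 1.

Theorem 2. Let $(\hat\alpha^{\ddagger}, \hat\beta^{\ddagger})$ denote the maximum
likelihood estimate of the parameter $(\alpha_0, \beta_0)$ under the model
$P^{\ddagger}_{\alpha_0,\beta_0}$, where
$(\alpha_0, \beta_0) \in [\alpha_{\min}, \alpha_{\max}] \times [\beta_{\min}, \beta_{\max}]$,
for finite $\alpha_{\min}$, $\alpha_{\max}$, $\beta_{\min}$, $\beta_{\max}$. Then
$(\hat\alpha^{\ddagger}, \hat\beta^{\ddagger})$ is $N_v^{1/2}$-consistent for
$(\alpha_0, \beta_0)$, and

$$
\sqrt{N_v} \left[
  \begin{pmatrix} \hat\alpha^{\ddagger} \\ \hat\beta^{\ddagger} \end{pmatrix}
  - \begin{pmatrix} \alpha_0 \\ \beta_0 \end{pmatrix}
\right]
\to N\!\left( 0,\ e^{-\alpha_0}
  \begin{pmatrix} 1 & -2 \\ -2 & 4 + 2e^{-\alpha_0-\beta_0} \end{pmatrix}
\right).
$$

Proof of this theorem, using arguments directly analogous to those of
Theorem 1, may be found in Appendix A.3. From the theorem we see that under
the sparse model $P^{\ddagger}_{\alpha,\beta}$, as under $P^{\dagger}_\alpha$, the
effective sample size is $N_v$.

4 Coverage of Wald Confidence Intervals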



Our asymptotic arguments in Section 3 were developed primarily for the purpose
of establishing the scaling associated with the asymptotic variance, so as to
provide insight into the question of effective sample size, our main focus here.
However, the asymptotically normal distributions we have derived are of no
little independent interest themselves, as they serve as a foundation for doing
formal inference on the model parameters in practice. By way of illustration,
here we explore their use for constructing confidence intervals, particularly
those based on Theorem 2. Under a model $P^{\ddagger}_{\alpha,\beta}$, the Wald
confidence intervals using plug-in estimators for the standard errors are
$\hat\alpha^{\ddagger} \pm z_{(1-CL)/2} \sqrt{e^{-\hat\alpha^{\ddagger}}/N_v}$ for
$\alpha$ and
$\hat\beta^{\ddagger} \pm z_{(1-CL)/2} \sqrt{e^{-\hat\alpha^{\ddagger}} (4 + 2e^{-\hat\alpha^{\ddagger}-\hat\beta^{\ddagger}})/N_v}$
for $\beta$, where $CL$ is the confidence level.
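As a sketch of how such intervals might be computed from the observed statistics $s(y)$ and $m(y)$, the code below uses a closed-form maximum likelihood estimate. That closed form is an assumption of this sketch rather than something stated in the text: it follows from treating each dyad as a three-category outcome (null, one-way, mutual), matching observed to expected category counts, and then undoing the two $\log N_v$ offsets. The function names are illustrative.

```python
import math
from statistics import NormalDist


def mle_sparse_reciprocity(s, m, n_v):
    """Closed-form MLE of (alpha, beta) under the doubly offset model (assumed derivation)."""
    n_dyads = n_v * (n_v - 1) // 2
    asym = s - 2 * m              # dyads with exactly one tie
    null = n_dyads - asym - m     # dyads with no tie
    if min(asym, m, null) <= 0:
        raise ValueError("MLE does not exist for these statistics")
    alpha_raw = math.log(asym / (2 * null))
    beta_raw = math.log(4 * m * null / asym ** 2)
    # Undo the offsets alpha - log(N_v) and beta + log(N_v).
    return alpha_raw + math.log(n_v), beta_raw - math.log(n_v)


def wald_intervals(s, m, n_v, level=0.95):
    """Wald intervals for alpha and beta with plug-in standard errors from Theorem 2."""
    alpha_hat, beta_hat = mle_sparse_reciprocity(s, m, n_v)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_alpha = math.sqrt(math.exp(-alpha_hat) / n_v)
    se_beta = math.sqrt(
        math.exp(-alpha_hat) * (4 + 2 * math.exp(-alpha_hat - beta_hat)) / n_v
    )
    return (
        (alpha_hat - z * se_alpha, alpha_hat + z * se_alpha),
        (beta_hat - z * se_beta, beta_hat + z * se_beta),
    )


# Hypothetical statistics for a 10-vertex network: 20 ties, 5 mutual dyads.
ci_alpha, ci_beta = wald_intervals(s=20, m=5, n_v=10, level=0.95)
```

The `ValueError` branch corresponds to statistics on the boundary of their range, where no finite estimate exists.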

Because our asymptotics are in $N_v = |V|$, we examine a variety of network
sizes. The desired asymptotic properties of the network are expressed in terms
of the per-capita mean value parameters $E^{\ddagger}_{\alpha,\beta}[s(Y)/N_v]$
and $E^{\ddagger}_{\alpha,\beta}[m(Y)/N_v]$.

We study two configurations:

(1) $(E^{\ddagger}_{\alpha,\beta}[s(Y)/N_v],\ E^{\ddagger}_{\alpha,\beta}[m(Y)/N_v]) = (1, 0.25)$ and

(2) $(E^{\ddagger}_{\alpha,\beta}[s(Y)/N_v],\ E^{\ddagger}_{\alpha,\beta}[m(Y)/N_v]) = (1, 0.40)$.

In other words, the expected mean outdegree is set to 1, and expected numbers of
out-ties that are reciprocated are 0.25 x 2 = 0.5 and 0.40 x 2 = 0.8 per actor,
respectively. These represent two levels of mutuality, though note that even (1)
represents substantial mutuality, especially for larger networks.

For each $N_v = 10, 15, 20, \ldots, 200$, we estimate the natural parameters of
the model $P^{\ddagger}_{\alpha,\beta}$ corresponding to the desired mean value
parameters, and then simulate 100,000 networks from each configuration,
evaluating the MLE and constructing a Wald confidence interval at each of the
customary levels of 80%, 90%, 95%, and 99%, for $\alpha$ and for $\beta$
(individually), checking the coverage.

For some of the smaller sample sizes, the simulated network statistics were not
in the interior of their convex hull ($0 < s(y) < N_v(N_v - 1)$ and
$0 < m(y) < s(y)/2$), so the MLE did not exist. (For (1), the fraction was 8.2%
for $N_v = 10$, and none of the 100,000 realizations had no MLE for $N_v > 55$.
For (2), it was 14.2% for $N_v = 10$, and none of the realizations had no MLE for
$N_v > 65$.)

Our results are conditional on the MLE existing. From the frequentist
perspective, one might argue that if the MLE did not exist for a real dataset, we
would not have reported that type of confidence interval, so it should be
excluded from the simulation as well.

Table 1: Simulated Theorem 2 confidence interval coverage levels for selected
network sizes and two levels of reciprocity: lower (1) and higher (2).

                        80.0%           90.0%           95.0%           99.0%
Config.   N_v      α       β       α       β       α       β       α       β
(1)        10    72.4%   77.3%   85.3%   89.8%   93.2%   95.2%   96.4%   99.4%
(1)        20    74.5%   77.3%   86.0%   89.4%   92.9%   94.9%   98.3%   99.5%
(1)        50    80.9%   78.8%   87.6%   89.4%   94.7%   94.8%   98.9%   99.2%
(1)       100    77.4%   79.6%   90.0%   90.0%   94.6%   94.9%   98.9%   99.1%
(1)       200    79.0%   79.5%   90.1%   89.8%   94.9%   94.9%   98.9%   99.0%
(2)        10    84.0%   84.2%   86.6%   89.8%   93.6%   94.3%   96.3%   98.2%
(2)        20    81.8%   80.3%   92.8%   92.1%   95.1%   96.0%   98.1%   98.8%
(2)        50    75.3%   79.5%   91.7%   89.4%   95.6%   95.1%   98.8%   99.0%
(2)       100    78.5%   79.7%   91.0%   90.2%   94.5%   94.9%   99.0%   99.1%
(2)       200    82.2%   79.9%   90.5%   89.9%   95.3%   95.1%   99.2%   99.1%

We report coverages for selected network sizes in Table 1 and provide a
visualization in Figure 2. Overall, the 80% coverage appears to be varied, and
not very conservative, while higher levels of confidence appear to be more
consistently conservative, particularly for estimates of $\beta$. Coverage for
$\alpha$ appears to oscillate as a function of network size. This is particularly
noticeable for the lower confidence levels and stronger mutuality (2). The
tendency of a confidence interval for a binomial proportion to oscillate around
the nominal level is a known phenomenon (Brown et al., 2001, 2002, and others),
though it is interesting to note that it appears to be more prominent for the
density, rather than mutuality, parameter and that it appears to be stronger for
stronger mutuality.



Figure 2: Differences between simulated coverage and nominal coverage for the
two configurations studied, as a function of network size $N_v$, for nominal
levels 80%, 90%, 95%, and 99%. Note that the differences are differences in
percentage points (simulated % - nominal %), not percent differences
((simulated - nominal)/nominal x 100%).

5 Example: Food-Sharing Networks in Lamalera

While the results of Section 3 are important in establishing how closely the
question of effective sample size in network modeling is tied to the structural
property of (non)sparseness expected of the networks modeled, there remains the
important practical question of establishing in applications just which model
(i.e., sparse or non-sparse) is most appropriate. While a full and detailed study
of this question is beyond the scope of this work, we present here an initial
exploration.

Note that, in exploring this question, we face a problem similar to that pointed
out by Krivitsky et al. (2011): it requires a collection of closed networks of a
variety of sizes yet substantively similar social structure. Furthermore, our
results are limited to modeling density and reciprocity, so the networks should
be well-approximated by this model. Here, we use data collected by Nolin (2010),
in which each of 317 households in Lamalera, Indonesia was asked to list the
households to whom they have given and households from whom they have received
food in the preceding season. Lamalera is split, administratively, into two
villages, which are further subdivided into wards, and then into neighborhoods.
Nolin (2010) fit several ERGMs to the network, finding that distance between
households had a significant effect on the propensity to share, as did kinship
between members of the households involved. Nolin also found a significant
positive mutuality effect.

In our study, we make use of the geographic effect by constructing a series
of 24 overlapping subnetworks, consisting of Lamalera itself, its 2 constituent
villages, 6 wards, and 15 neighborhoods, with network sizes ranging from 12 to
317. We then fit the baseline model $P_{\alpha,\beta}$ to each network. If
$P_{\alpha,\beta}$ is the most realistic asymptotic regime for these data, we
would expect estimates $\hat\alpha$ and $\hat\beta$ to have no relationship to
$\log N_v$ for the corresponding network. If $P^{\dagger}_{\alpha,\beta}$ is the
most realistic, we would expect no relationship between $\log N_v$ and
$\hat\beta$, but an approximately linear relationship with $\hat\alpha$, with
slope around $-1$. Lastly, if $P^{\ddagger}_{\alpha,\beta}$ is the most realistic,
we would expect the slope of the relationship between $\log N_v$ and $\hat\alpha$
to be around $-1$ and between $\log N_v$ and $\hat\beta$ to be around $+1$.
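The logic of this diagnostic can be illustrated without any data. The sketch below is a hypothetical, noise-free check and does not use the Lamalera data: it takes the expected dyad counts under the doubly offset model for several network sizes, fits the baseline (no-offset) model to them in closed form, and regresses the fitted values on $\log N_v$. The closed-form fit, the true parameter values, and the list of sizes are all assumptions of the sketch.

```python
import math


def expected_counts_double_offset(alpha, beta, n_v):
    """Expected numbers of one-way, mutual, and null dyads with both offsets applied."""
    a = alpha - math.log(n_v)
    b = beta + math.log(n_v)
    w_asym, w_mutual = math.exp(a), math.exp(2 * a + b)
    z = 1.0 + 2.0 * w_asym + w_mutual
    n_dyads = n_v * (n_v - 1) / 2
    return n_dyads * 2 * w_asym / z, n_dyads * w_mutual / z, n_dyads / z


def baseline_mle(asym, mutual, null):
    """Closed-form fit of the baseline model (no offsets) to dyad counts."""
    return math.log(asym / (2 * null)), math.log(4 * mutual * null / asym ** 2)


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


sizes = [12, 25, 50, 100, 200, 317]  # illustrative sizes within the range studied
fits = [baseline_mle(*expected_counts_double_offset(0.0, 1.0, n)) for n in sizes]
log_n = [math.log(n) for n in sizes]
slope_alpha = ols_slope(log_n, [f[0] for f in fits])
slope_beta = ols_slope(log_n, [f[1] for f in fits])
```

In this idealized setting the slopes for the fitted density and reciprocity parameters come out at $-1$ and $+1$; with real data, sampling noise and violations of the model assumptions would move them away from these values.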

The estimated coefficients and the slopes are given in Figure 3. The results
are suggestive. The relationship between $\hat\alpha$ and $\log N_v$ is clearly
negative, while the relationship between $\hat\beta$ and $\log N_v$ is clearly
positive, and the magnitudes of both slopes are closer to 1 than to 0 (although
both are far from equaling 1). Overlap between the subnetworks induces
dependence among the coefficients, so it is not possible to formally test or
estimate how significant this difference is. Nevertheless, the preponderance of
evidence is that $P^{\ddagger}_{\alpha,\beta}$ is the best of the three
considered. That is, a sparse model that does not enforce sparsity on
reciprocating ties appears to be preferable here.

Figure 3: Maximum likelihood estimates from fitting $P_{\alpha,\beta}$ to each
subdivision of the Lamalera food-sharing network, plotted against $\log(N_v)$.
Colors indicate subdivision type (neighborhood, village, ward, whole network).
The least-squares coefficients from regressing $\hat\alpha$ and $\hat\beta$ on
$\log N_v$ are $-0.72$ and $+0.60$, respectively.

A possible explanation for why the magnitudes of the slopes are substantially
less than 1 is that both the argument of Krivitsky et al. (2011) and our argument
in Section 3.2 rely on the assumption that the network is closed, or, at least,
that the stable mean degree and per-capita reciprocity are for the ties within
the network observed. However, while there is likely to be very little food
sharing out of or into Lamalera, and relatively little between the two villages
it comprises (7% of all food-sharing ties in the network are between villages),
there is more sharing between the wards (28% are between wards), and even more
between neighborhoods (44%). Thus, the closed-network assumption is violated.
(The respective between-subdivision percentages for reciprocated ties are 6%,
22%, and 39%.) When each of the subdivisions of the network is considered in
isolation, these ties are lost, so the smaller subdivisions appear, to the model,
to have smaller mean degree and per-capita mutuality. (See Figure 4.) This, in
turn, means that smaller subdivisions have a decreased $\hat\alpha$ (increasing
the slope for it in Fig. 3) and, because mutual ties suffer less of this
"attrition" than ties do overall, the $\hat\beta$, after adjusting for the
decreased $\hat\alpha$, is increased for smaller networks, thus reducing the
slope for $\hat\beta$ in Fig. 3. It is not unlikely that this pattern will hold
in any network with an unobserved spatial structure, whose subnetworks of
interest are contiguous regions in this space.



6 Discussion 

Unlike conventional data, network data typically do not have an unambiguous 
notion of sample size. The examples we have presented show that the effective 
sample size associated with a network depends strongly on the model assumed for 
how the network scales. In particular, in the case of reciprocity, whether or not the 
model for scaling takes into account the notion of preexisting relationship affects 
whether reciprocity is even meaningful for large networks. 

Our model is, intentionally, a very simple one. However, with reciprocity, it
includes an important aspect that already allows us a glimpse beyond the more
sophisticated treatments of, say, Chatterjee et al. (2011) and Rinaldo et al.
(2011), for so-called beta models, where the dependency induced here by
reciprocity is absent. In addition, the results for reciprocity suggest that the
effective modeling of triadic (e.g., friend of a friend of a friend) effects,
arguably the most natural type of dependency to add next to the current model,
in a manner indexed to network size is likely to require a more complex
treatment yet, which, in turn, may further complicate the notion of effective
sample size. While it is likely that insight into how to proceed from here can
be gleaned from experience in other parts of the literature for dependent data,
such as for time series and spatial data, it is unlikely that the tools developed
there can be applied directly. Rather, new tools and techniques almost certainly
are required here.

Figure 4: Per-capita network statistics as a function of $N_v$. Colors indicate
subdivision type. Note that the larger subdivisions have more within-subdivision
ties.

We note that asymptotic theory supporting methods for the construction of
confidence intervals for network parameters is only beginning to emerge. The
most traction appears to have been gained in the context of stochastic block
models (e.g., Bickel and Chen (2009); Choi et al. (2010); Celisse et al. (2011);
Rohe et al. (2011)), although progress is beginning to be made with exponential
random graph models as well (e.g., Chatterjee et al. (2011); Chatterjee and
Diaconis (2011); Rinaldo et al. (2011)). Most of these works present consistency
results for maximum likelihood and related estimators, with the exception of
Bickel and Chen (2009), which also includes results on asymptotic normality of
estimators. Our work contributes to this important but nascent area.

The lack of an established understanding of the distributional properties of pa- 
rameter estimates in commonly used network models is particularly unfortunate 
given that a number of software packages now allow for the easy computation 
of such estimates. For example, packages for computing estimates of parame- 
ters in fairly general formulations of exponential random graph models routinely 
report both estimates and, ostensibly, standard errors, where the latter are based 
on standard arguments for exponential families. Unfortunately, practitioners do 
not always seem to be aware that the use of these standard errors for constructing 
normal-theory confidence intervals and tests is lacking in any formal justification. 
From that perspective, our work appears to be one of the first to begin laying 
the necessary theoretical foundation to justify practical confidence interval
procedures in exponential random graph models. See Haberman (1981) for another
contribution in this direction, proposed as part of the discussion of the original
paper by Holland and Leinhardt (1981).
A Derivations of Key Results

These supplementary materials contain the derivations of key results and
expressions provided in the main paper. We begin by establishing preliminary
notation and expressions in Section A.1. In Sections A.2 and A.3 we offer proofs
of Theorems 1 and 2, respectively.



A.1 Preliminaries

Recall that the models we consider are all variations of the form (1) with
sufficient statistics $g(y)$ and parameters
$(\alpha, \beta) \in [\alpha_{\min}, \alpha_{\max}] \times [\beta_{\min}, \beta_{\max}]$,
with $\alpha_{\min}$, $\alpha_{\max}$, $\beta_{\min}$, and $\beta_{\max}$ all finite. In
particular, all models are of exponential family form, either with or without
offset terms, in the terminology of McCullagh and Nelder (1989). Hence, for all
models the probability mass function can be written as
$p_{\theta}(y) = \exp\{\theta^{T} g(y)\}/\kappa(\theta)$, and the log-likelihood as
$\ell(\theta) = \theta^{T} g(y) - \psi(\theta)$, where $\kappa(\theta)$ is the
normalization term and $\psi(\theta) = \log \kappa(\theta)$. Furthermore, the Fisher
information matrix in each case is given by the formula
$I(\theta) = \partial^{2} \psi(\theta) / \partial\theta\, \partial\theta^{T}$.
Let $\ell_{N_v}(\alpha)$ denote the log-likelihood under the Bernoulli model
$p_{\alpha}$; let $\ell^{\dagger}_{N_v}(\alpha)$ denote the log-likelihood under the
Bernoulli model $p^{\dagger}_{\alpha}$, with offset
$\alpha \mapsto \alpha - \log N_v$; let $\ell_{N_v}(\alpha, \beta)$ denote the
log-likelihood under the Bernoulli model with reciprocity $p_{\alpha,\beta}$; and let
$\ell^{\dagger}_{N_v}(\alpha, \beta)$ denote the log-likelihood under the Bernoulli
model with reciprocity $p^{\dagger}_{\alpha,\beta}$, with offsets
$\alpha \mapsto \alpha - \log N_v$ and $\beta \mapsto \beta + \log N_v$.
All expressions provided in the main paper for the orders of magnitude of the
elements of the corresponding information matrices, under these various models,
may be obtained directly by twice differentiating

$$\psi(\alpha, \beta) = \binom{N_v}{2} \log\left(1 + 2e^{\alpha} + e^{2\alpha+\beta}\right),$$

appropriately parameterized.

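Thus, for example, it is straightforward to show that

$$I_{N_v}(\alpha) = N_v(N_v - 1)\, \frac{e^{\alpha}}{1 + e^{\alpha}} \cdot \frac{1}{1 + e^{\alpha}} = O(N_v^{2}), \tag{3}$$

while

$$I^{\dagger}_{N_v}(\alpha) = N_v(N_v - 1)\, \frac{e^{\alpha}/N_v}{\left(1 + e^{\alpha}/N_v\right)^{2}} = O(N_v).$$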


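Similarly, for the two-parameter models, defining $A = e^{\alpha}$ and
$B = e^{2\alpha+\beta}$, we find that

$$I_{N_v}(\alpha, \beta) = \frac{\binom{N_v}{2}}{(1 + 2A + B)^{2}}
\begin{bmatrix} 2A + 4B + 2AB & 2(B + AB) \\ 2(B + AB) & 2AB + B \end{bmatrix},$$

and so $I_{N_v}(\alpha, \beta) = O(N_v^{2})$. On the other hand, substituting
$A/N_v$ and $B/N_v$ for $A$ and $B$, respectively, which captures the effects of
the two offsets in the $p^{\dagger}_{\alpha,\beta}$ model, and simplifying, we find
that $I^{\dagger}_{N_v}(\alpha, \beta)$ behaves asymptotically like

$$N_v \begin{bmatrix} A + 2B & B \\ B & B/2 \end{bmatrix},$$

and hence is $O(N_v)$.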
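As an informal numerical check on these orders of magnitude, the following Python sketch differentiates the log-normalizer twice by central finite differences and compares the result with the closed-form matrices stated above. The function names (`psi`, `psi_sparse`, `hessian`, `info_closed_form`, `sparse_limit`), the step size `h`, and any parameter values used with them are illustrative assumptions, not part of the original derivation.

```python
import math

def psi(alpha, beta, n):
    """Log-normalizer binom(n, 2) * log(1 + 2 e^alpha + e^(2 alpha + beta))."""
    pairs = n * (n - 1) / 2
    return pairs * math.log1p(2 * math.exp(alpha) + math.exp(2 * alpha + beta))

def psi_sparse(alpha, beta, n):
    """Same log-normalizer with the offsets alpha - log n and beta + log n."""
    return psi(alpha - math.log(n), beta + math.log(n), n)

def hessian(f, alpha, beta, n, h=1e-3):
    """2x2 Hessian of f in (alpha, beta) by central finite differences."""
    steps = [(h, 0.0), (0.0, h)]
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i, (a1, b1) in enumerate(steps):
        for j, (a2, b2) in enumerate(steps):
            out[i][j] = (f(alpha + a1 + a2, beta + b1 + b2, n)
                         - f(alpha + a1 - a2, beta + b1 - b2, n)
                         - f(alpha - a1 + a2, beta - b1 + b2, n)
                         + f(alpha - a1 - a2, beta - b1 - b2, n)) / (4 * h * h)
    return out

def info_closed_form(alpha, beta, n):
    """Closed-form information matrix of the model without offsets."""
    a, b = math.exp(alpha), math.exp(2 * alpha + beta)
    c = (n * (n - 1) / 2) / (1 + 2 * a + b) ** 2
    return [[c * (2 * a + 4 * b + 2 * a * b), c * 2 * (b + a * b)],
            [c * 2 * (b + a * b), c * (2 * a * b + b)]]

def sparse_limit(alpha, beta):
    """Limit of the sparse-model information matrix divided by n."""
    a, b = math.exp(alpha), math.exp(2 * alpha + beta)
    return [[a + 2 * b, b], [b, b / 2]]
```

For a moderate `n`, `hessian(psi, alpha, beta, n)` should agree closely with `info_closed_form(alpha, beta, n)`, and the entries of `hessian(psi_sparse, alpha, beta, n)` divided by `n` should approach `sparse_limit(alpha, beta)` as `n` grows.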
A.2 Proof of Theorem 1

Theorem 1 establishes consistency and asymptotic normality for $\hat{\alpha}$ and
$\hat{\alpha}^{\dagger}$, the maximum likelihood estimates of $\alpha_0$ under the
models $p_{\alpha_0}$ and $p^{\dagger}_{\alpha_0}$, respectively. Here we sketch the
case of $\hat{\alpha}^{\dagger}$ as the more interesting of the two. The case of
$\hat{\alpha}$ follows using conventional arguments.

There are various ways one might argue consistency of $\hat{\alpha}^{\dagger}$. One
approach would be to use techniques for M-estimators, the main requirement for
which is that the log-likelihood converge uniformly in probability to a function
with a well-defined maximum at $\alpha_0$. However, note that

$$\frac{1}{N_v}\, \ell^{\dagger}_{N_v}(\alpha) = (N_v - 1)
\left[ (\alpha - \log N_v)\, \bar{s}(y) - \log\left(1 + \frac{e^{\alpha}}{N_v}\right) \right],$$

where $\bar{s}(y) = s(y)/[N_v(N_v - 1)]$. Since

$$(N_v - 1)\, \bar{s}(y) \to e^{\alpha_0}$$

in probability, as $N_v \to \infty$, and $\log(1 + e^{\alpha}/N_v)$ behaves like
$e^{\alpha}/N_v$ for large $N_v$, it follows that the log-likelihood
$\ell^{\dagger}_{N_v}(\alpha)/N_v$ behaves like
$(\alpha - \log N_v)e^{\alpha_0} - e^{\alpha}$ for large $N_v$, and hence tends to
$-\infty$ for all $\alpha$. As a result, while the maximization of the
log-likelihood is well-defined for each finite $N_v$, this method of proof is not
amenable to demonstrating consistency here.

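Instead, we study the behavior of the derivative of the log-likelihood with
respect to $\alpha$, i.e.,

$$\Psi^{\dagger}_{N_v}(\alpha) = \frac{1}{N_v} \frac{\partial}{\partial \alpha}\, \ell^{\dagger}_{N_v}(\alpha)
= (N_v - 1) \left[ \bar{s}(y) - \frac{e^{\alpha}/N_v}{1 + e^{\alpha}/N_v} \right], \tag{4}$$

since $\Psi^{\dagger}_{N_v}(\alpha) \to \Psi^{\dagger}(\alpha)$ in probability for each
$\alpha \in \Theta$, where $\Psi^{\dagger}(\alpha) = e^{\alpha_0} - e^{\alpha}$.
By Theorem 5.9 of Van der Vaart (2000), since $\Psi^{\dagger}(\alpha)$ has a unique
zero at $\alpha = \alpha_0$, in order for us to demonstrate consistency it remains
to show that $\Psi^{\dagger}_{N_v}(\alpha)$ converges uniformly to
$\Psi^{\dagger}(\alpha)$ on $\Theta$, i.e., that

$$\sup_{\alpha \in \Theta} \left| \Psi^{\dagger}_{N_v}(\alpha) - \Psi^{\dagger}(\alpha) \right| \to 0$$

in probability as $N_v \to \infty$.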
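We have assumed that $\alpha \in [\alpha_{\min}, \alpha_{\max}]$, a compact and
connected set, and we have observed pointwise convergence of
$\Psi^{\dagger}_{N_v}$ to $\Psi^{\dagger}$. Uniform convergence therefore follows if
we can demonstrate stochastic equicontinuity of

$$Q_{N_v}(\alpha) = \Psi^{\dagger}_{N_v}(\alpha) - \Psi^{\dagger}(\alpha)
= (N_v - 1) \left[ \bar{s}(y) - \frac{e^{\alpha}/N_v}{1 + e^{\alpha}/N_v} \right] - \left( e^{\alpha_0} - e^{\alpha} \right).$$

But a sufficient condition for stochastic equicontinuity is that $Q_{N_v}$ be
Lipschitz. See Corollary 2.2 to Theorem 2.1 of Newey (1991), for example. That
this last condition is true, however, can be argued easily enough, as $Q_{N_v}$ is
a continuously differentiable function of $\alpha$ on a compact, connected domain,
which by the mean value theorem allows us to write

$$\left| Q_{N_v}(\alpha) - Q_{N_v}(\alpha') \right| \le K \left| \alpha - \alpha' \right|,$$

where $K = \sup_{\alpha} |\dot{Q}_{N_v}(\alpha)|$ and $\dot{Q}_{N_v}$ denotes the
derivative of $Q_{N_v}$ with respect to $\alpha$. Hence $Q_{N_v}$ is Lipschitz and
$\hat{\alpha}^{\dagger}$ has been shown to be consistent.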
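The uniform-convergence step can be illustrated numerically. The sketch below rests on a rearrangement that is ours rather than the paper's: $Q_{N_v}(\alpha)$ is split into a random term $(N_v - 1)\bar{s}(y) - e^{\alpha_0}$ that does not depend on $\alpha$, plus a deterministic term $e^{\alpha} - (N_v - 1)(e^{\alpha}/N_v)/(1 + e^{\alpha}/N_v)$. The code evaluates the supremum of that deterministic term, and of the derivative of $Q_{N_v}$, over a grid. The interval $[-2, 2]$ suggested in the usage note, the grid size, and all function names are hypothetical choices made only for illustration.

```python
import math

def det_gap(alpha, n):
    """Deterministic, alpha-dependent part of Q_N: e^alpha - (n-1) p_n(alpha)."""
    a = math.exp(alpha)
    return a - (n - 1) * (a / n) / (1 + a / n)

def q_dot(alpha, n):
    """Derivative of Q_N in alpha; the random term drops out because it is free of alpha."""
    a = math.exp(alpha)
    return a - (n - 1) * (a / n) / (1 + a / n) ** 2

def sup_on_grid(f, n, lo, hi, points=2001):
    """Largest absolute value of f(., n) over an evenly spaced grid on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (points - 1) for k in range(points)]
    return max(abs(f(x, n)) for x in grid)
```

Under these assumptions, `sup_on_grid(det_gap, n, -2.0, 2.0)` should shrink as `n` increases, while `sup_on_grid(q_dot, n, -2.0, 2.0)` stays below `math.exp(2.0)` for every `n`, which is the kind of bound on the Lipschitz constant that the argument relies on.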

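To establish asymptotic normality, we use a standard argument based on Taylor
series expansions. Begin by writing

$$0 = \Psi^{\dagger}_{N_v}(\hat{\alpha}^{\dagger}) = \Psi^{\dagger}_{N_v}(\alpha_0)
+ (\hat{\alpha}^{\dagger} - \alpha_0)\, \dot{\Psi}^{\dagger}_{N_v}(\alpha_0)
+ \tfrac{1}{2} (\hat{\alpha}^{\dagger} - \alpha_0)^{2}\, \ddot{\Psi}^{\dagger}_{N_v}(\tilde{\alpha}),$$

where $\tilde{\alpha}$ is a value between $\hat{\alpha}^{\dagger}$ and $\alpha_0$. (Here
again we employ the dot notation for differentiation with respect to $\alpha$.)
It follows that

$$N_v^{1/2} (\hat{\alpha}^{\dagger} - \alpha_0) =
\frac{- N_v^{1/2}\, \Psi^{\dagger}_{N_v}(\alpha_0)}
{\dot{\Psi}^{\dagger}_{N_v}(\alpha_0) + \tfrac{1}{2} (\hat{\alpha}^{\dagger} - \alpha_0)\, \ddot{\Psi}^{\dagger}_{N_v}(\tilde{\alpha})}. \tag{5}$$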

Consider the numerator in (5). Recalling the form of (4), we see that
$\Psi^{\dagger}_{N_v}(\alpha_0)$ is simply proportional to an average of the
$N_v(N_v - 1)$ independent and identically distributed link variables $y_{ij}$
(i.e., $\bar{s}(y)$) centered by its mean. However, two points are worth noting.
First, this mean is changing as a function of $N_v$. Second, as $N_v$ is allowed
to tend towards infinity, for an increase of $N_v$ to $N_v + 1$ there are a total
of $2N_v$ more observations $y_{ij}$ that define the average $\bar{s}(y)$.
Therefore, we employ a double array central limit theorem in arguing for the
asymptotic normality of a suitably normalized version of
$\Psi^{\dagger}_{N_v}(\alpha_0)$.
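Specifically, write

$$\mu_{N_v}(\alpha_0) = E\left(\bar{s}(Y)\right) = \frac{e^{\alpha_0}/N_v}{1 + e^{\alpha_0}/N_v}$$

and

$$v_{N_v}(\alpha_0) = \mathrm{Var}\left(\bar{s}(Y)\right)
= \frac{1}{N_v(N_v - 1)} \cdot \frac{e^{\alpha_0}/N_v}{\left(1 + e^{\alpha_0}/N_v\right)^{2}},$$

and consider the standardized variables

$$X_{ij} = \frac{Y_{ij} - \mu_{N_v}(\alpha_0)}{N_v(N_v - 1)\, v^{1/2}_{N_v}(\alpha_0)}.$$

It is easy to see that, for each fixed $N_v$, the $N_v(N_v - 1)$ random variables
$X_{ij}$ are all bounded by a constant, say $M_{N_v}$, and that
$\lim_{N_v \to \infty} M_{N_v} = 0$. (Indeed, $M_{N_v} = O(N_v^{-1/2})$.) Hence the
condition of the corollary immediately following Theorem 7.1.2 of Chung (2001)
holds, and therefore the sum of the $X_{ij}$'s, which is the standardized mean

$$\frac{\bar{s}(Y) - \mu_{N_v}(\alpha_0)}{v^{1/2}_{N_v}(\alpha_0)},$$

tends in distribution to a standard normal. Noting then that this sum behaves
asymptotically as $N_v^{1/2}\, \Psi^{\dagger}_{N_v}(\alpha_0)\, e^{-\alpha_0/2}$, we
conclude that the numerator in (5) tends to a zero-mean normal random variable
with variance $e^{\alpha_0}$.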

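Now consider the denominator in (5). Direct calculation yields that

$$\dot{\Psi}^{\dagger}_{N_v}(\alpha_0) = -(N_v - 1)\, \frac{e^{\alpha_0}/N_v}{\left(1 + e^{\alpha_0}/N_v\right)^{2}} \to -e^{\alpha_0}.$$

In addition, we know that $\hat{\alpha}^{\dagger} - \alpha_0 = o_p(1)$, while an
exercise in calculus shows that $\ddot{\Psi}^{\dagger}_{N_v}(\tilde{\alpha}) = O_p(1)$.
As a result, the denominator in (5) tends to $-e^{\alpha_0}$ in probability.

Combining these results, we conclude that
$N_v^{1/2} (\hat{\alpha}^{\dagger} - \alpha_0)$ tends in distribution to a mean-zero
normal random variable with variance $e^{-\alpha_0}$, as was to be shown.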
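A small Monte Carlo experiment gives a rough sense of this limit. The sketch below assumes the sparse Bernoulli model with $\operatorname{logit} P(Y_{ij} = 1) = \alpha_0 - \log N_v$ for all $N_v(N_v - 1)$ ordered pairs, so that the maximum likelihood estimate has the closed form $\operatorname{logit}(\bar{s}) + \log N_v$. The function names, the seed, and the geometric-gap method used to draw the edge count are illustrative choices of ours and do not come from the paper.

```python
import math
import random

def draw_edge_count(trials, p, rng):
    """Binomial(trials, p) draw, built from geometric gaps between successes."""
    count, pos = 0, 0
    log_q = math.log1p(-p)
    while True:
        # Gap to the next success is geometric on {1, 2, ...}.
        pos += int(math.log(1.0 - rng.random()) / log_q) + 1
        if pos > trials:
            return count
        count += 1

def sparse_mle(edge_count, n):
    """MLE of alpha when logit P(Y_ij = 1) = alpha - log n (needs 0 < edge_count < n(n-1))."""
    s_bar = edge_count / (n * (n - 1))
    return math.log(s_bar / (1 - s_bar)) + math.log(n)

def scaled_errors(alpha0, n, reps, seed=0):
    """Replicates of sqrt(n) * (alpha_hat - alpha0) under the sparse model."""
    rng = random.Random(seed)
    p = (math.exp(alpha0) / n) / (1 + math.exp(alpha0) / n)
    trials = n * (n - 1)
    return [math.sqrt(n) * (sparse_mle(draw_edge_count(trials, p, rng), n) - alpha0)
            for _ in range(reps)]
```

With a few hundred vertices and a couple of thousand replicates, the sample variance of `scaled_errors(alpha0, n, reps)` should sit near `math.exp(-alpha0)`, up to Monte Carlo noise and finite-$N_v$ effects.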

A.3 Proof of Theorem 2

Theorem 2 establishes consistency and asymptotic normality for
$(\hat{\alpha}^{\dagger}, \hat{\beta}^{\dagger})$, the maximum likelihood estimate of
$(\alpha_0, \beta_0)$ under the model $p^{\dagger}_{\alpha_0,\beta_0}$. Our proof uses
arguments analogous to those of Theorem 1.
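We again use Theorem 5.9 of Van der Vaart (2000) to show consistency of our
estimator. Analogous to (4), the vector of partial derivatives of the
log-likelihood function $\ell^{\dagger}_{N_v}(\alpha, \beta)/N_v$, with respect to
$\alpha$ and $\beta$, has the form

$$\Psi^{\dagger}_{N_v}(\alpha, \beta) = \frac{N_v - 1}{2}
\begin{bmatrix}
\bar{s}(y) - \dfrac{2\left(e^{\alpha} + e^{2\alpha+\beta}\right)}{N_v + 2e^{\alpha} + e^{2\alpha+\beta}} \\[2ex]
\bar{m}(y) - \dfrac{e^{2\alpha+\beta}}{N_v + 2e^{\alpha} + e^{2\alpha+\beta}}
\end{bmatrix}, \tag{6}$$

where $\bar{s}(y) = s(y)/\binom{N_v}{2}$ and $\bar{m}(y) = m(y)/\binom{N_v}{2}$. Note
that

$$(N_v - 1) \begin{bmatrix} \bar{s}(y) \\ \bar{m}(y) \end{bmatrix} \to
\begin{bmatrix} 2\left(e^{\alpha_0} + e^{2\alpha_0+\beta_0}\right) \\ e^{2\alpha_0+\beta_0} \end{bmatrix}$$

in probability as $N_v \to \infty$. Therefore,
$\Psi^{\dagger}_{N_v}(\alpha, \beta) \to \Psi^{\dagger}(\alpha, \beta)$ in probability
pointwise, where

$$\Psi^{\dagger}(\alpha, \beta) = \frac{1}{2}
\begin{bmatrix}
2\left(e^{\alpha_0} + e^{2\alpha_0+\beta_0}\right) - 2\left(e^{\alpha} + e^{2\alpha+\beta}\right) \\
e^{2\alpha_0+\beta_0} - e^{2\alpha+\beta}
\end{bmatrix}.$$

It is straightforward to show that $\Psi^{\dagger}(\alpha, \beta) = 0$ has the unique
solution $(\alpha, \beta) = (\alpha_0, \beta_0)$. So consistency follows from arguing
that the convergence of $\Psi^{\dagger}_{N_v}$ to $\Psi^{\dagger}$ is uniform on the set
$[\alpha_{\min}, \alpha_{\max}] \times [\beta_{\min}, \beta_{\max}]$. Following the same
line of reasoning used above in the proof of Theorem 1, it suffices to show that
$Q_{N_v} = \Psi^{\dagger}_{N_v} - \Psi^{\dagger}$ is Lipschitz on this set. But this
follows immediately from the facts that the gradient of $Q_{N_v}$ is continuous on
this set and the set is compact and connected, followed by an appeal to the
multivariate version of the mean-value theorem. As a result, consistency follows.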
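For a finite network the score equations in (6) can be solved in closed form, which makes the uniqueness claim easy to check numerically. The sketch below assumes that $\bar{s}$ is the number of ties per dyad and $\bar{m}$ the number of mutual dyads per dyad, so that $1 - \bar{s} + \bar{m}$ and $\bar{s} - 2\bar{m}$ are the fractions of empty and asymmetric dyads. The function names are hypothetical and the closed form is our own rearrangement of (6), not a formula quoted from the paper.

```python
import math

def expected_stats(alpha0, beta0, n):
    """Mean of (s_bar, m_bar) under the sparse reciprocity model."""
    a0, b0 = math.exp(alpha0), math.exp(2 * alpha0 + beta0)
    denom = n + 2 * a0 + b0
    return 2 * (a0 + b0) / denom, b0 / denom

def score(alpha, beta, s_bar, m_bar, n):
    """Scaled score vector of equation (6)."""
    a, b = math.exp(alpha), math.exp(2 * alpha + beta)
    denom = n + 2 * a + b
    half = (n - 1) / 2
    return half * (s_bar - 2 * (a + b) / denom), half * (m_bar - b / denom)

def solve_score(s_bar, m_bar, n):
    """Root of the score; requires empty, asymmetric and mutual dyads to all occur."""
    null = 1 - s_bar + m_bar      # fraction of dyads with no tie
    asym = s_bar - 2 * m_bar      # fraction of dyads with exactly one tie
    a = n * asym / (2 * null)     # estimate of e^alpha
    b = n * m_bar / null          # estimate of e^(2 alpha + beta)
    alpha = math.log(a)
    return alpha, math.log(b) - 2 * alpha
```

Feeding `expected_stats(alpha0, beta0, n)` into `solve_score` should return `(alpha0, beta0)` up to rounding, and `score` evaluated at that solution should be numerically zero in both components.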

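To argue for asymptotic normality, we begin again with a Taylor series
expansion, which allows us to write

$$-\Psi^{\dagger}_{N_v}(\alpha_0, \beta_0) =
\dot{\Psi}^{\dagger}_{N_v}(\alpha_0, \beta_0)
\begin{bmatrix} \hat{\alpha}^{\dagger} - \alpha_0 \\ \hat{\beta}^{\dagger} - \beta_0 \end{bmatrix}
+ \frac{1}{2}
\begin{bmatrix} \hat{\alpha}^{\dagger} - \alpha_0 \\ \hat{\beta}^{\dagger} - \beta_0 \end{bmatrix}^{T}
\ddot{\Psi}^{\dagger}_{N_v}(\tilde{\alpha}, \tilde{\beta})
\begin{bmatrix} \hat{\alpha}^{\dagger} - \alpha_0 \\ \hat{\beta}^{\dagger} - \beta_0 \end{bmatrix}. \tag{7}$$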

We then argue that (a) on the left-hand side of this expression the term converges 
asymptotically to a multivariate normal distribution, while (b) on the right-hand 
side, the multiplier in the first-order term converges to a constant, and the second- 
order term is asymptotically negligible. 
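In order to demonstrate asymptotic normality on the left-hand side of (7), we
employ the Cramér-Wold device. From (6) we see that the behavior of
$\Psi^{\dagger}_{N_v}$ is driven by that of the mean vector
$(\bar{s}(y), \bar{m}(y))^{T}$. Letting $A_0 = e^{\alpha_0}$ and
$B_0 = e^{2\alpha_0+\beta_0}$, write

$$\mu_{N_v}(\alpha_0, \beta_0) = E \begin{bmatrix} \bar{s}(Y) \\ \bar{m}(Y) \end{bmatrix}
= (N_v + 2A_0 + B_0)^{-1} \begin{bmatrix} 2(A_0 + B_0) \\ B_0 \end{bmatrix}$$

and

$$V_{N_v}(\alpha_0, \beta_0) = \mathrm{Cov} \begin{bmatrix} \bar{s}(Y) \\ \bar{m}(Y) \end{bmatrix}
= \binom{N_v}{2}^{-1} \frac{1}{N_v \left(1 + 2A_0/N_v + B_0/N_v\right)^{2}}
\begin{bmatrix}
2A_0 + 4B_0 + 2A_0B_0/N_v & 2B_0 + 2A_0B_0/N_v \\
2B_0 + 2A_0B_0/N_v & B_0 + 2A_0B_0/N_v
\end{bmatrix}.$$

Fix $t = (t_1, t_2)^{T}$ and consider the standardized variables

$$X^{(t)}_{ij} = \frac{t_1 (Y_{ij} + Y_{ji}) + t_2 Y_{ij} Y_{ji} - t^{T} \mu_{N_v}(\alpha_0, \beta_0)}
{\binom{N_v}{2} \sqrt{t^{T} V_{N_v}(\alpha_0, \beta_0)\, t}}.$$

Again using Theorem 7.1.2 of Chung (2001), we have that the sum of the
$X^{(t)}_{ij}$'s, and hence

$$\frac{t_1 \bar{s}(Y) + t_2 \bar{m}(Y) - t^{T} \mu_{N_v}(\alpha_0, \beta_0)}
{\sqrt{t^{T} V_{N_v}(\alpha_0, \beta_0)\, t}},$$

tends in distribution to a standard normal. But this holds true for all $t$, and
therefore

$$V^{-1/2}_{N_v}(\alpha_0, \beta_0)
\left( \begin{bmatrix} \bar{s}(Y) \\ \bar{m}(Y) \end{bmatrix} - \mu_{N_v}(\alpha_0, \beta_0) \right) \tag{8}$$

tends asymptotically to a bivariate normal with mean zero and covariance the
identity.

Some algebra shows that the statistic in (8) behaves asymptotically like

$$N_v^{1/2}\, V^{-1/2}(\alpha_0, \beta_0)\, \Psi^{\dagger}_{N_v}(\alpha_0, \beta_0),$$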
where

$$V(\alpha_0, \beta_0) = \begin{bmatrix} A_0 + 2B_0 & B_0 \\ B_0 & B_0/2 \end{bmatrix}. \tag{9}$$

Therefore, $N_v^{1/2}\, \Psi^{\dagger}_{N_v}(\alpha_0, \beta_0)$ tends asymptotically
to a bivariate normal distribution, with zero mean and covariance
$V(\alpha_0, \beta_0)$. This establishes the behavior of the left-hand side of (7).

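Considering the right-hand side of (7), direct calculation shows that

$$\dot{\Psi}^{\dagger}_{N_v}(\alpha_0, \beta_0) = -\frac{N_v - 1}{2} \binom{N_v}{2}\, V_{N_v}(\alpha_0, \beta_0) \to -V(\alpha_0, \beta_0).$$

In addition, we know that
$\| (\hat{\alpha}^{\dagger}, \hat{\beta}^{\dagger})^{T} - (\alpha_0, \beta_0)^{T} \| = o_p(1)$
and it is straightforward to show that
$\ddot{\Psi}^{\dagger}_{N_v}(\tilde{\alpha}, \tilde{\beta}) = O_p(1)$.

Combining these results, we conclude that

$$N_v^{1/2} \begin{bmatrix} \hat{\alpha}^{\dagger} - \alpha_0 \\ \hat{\beta}^{\dagger} - \beta_0 \end{bmatrix}
\to N\left(0,\, V^{-1}(\alpha_0, \beta_0)\right)$$

in distribution. Noting that

$$V^{-1}(\alpha_0, \beta_0) = e^{-\alpha_0}
\begin{bmatrix} 1 & -2 \\ -2 & 4 + 2e^{-\alpha_0 - \beta_0} \end{bmatrix}$$

completes the proof of the theorem.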
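The closed-form inverse can be checked directly, and it also suggests how approximate standard errors might be read off in practice. The sketch below multiplies $V(\alpha_0, \beta_0)$ by the stated inverse and, as an illustration only, converts the diagonal of $V^{-1}$ into standard errors on the $N_v^{-1/2}$ scale implied by the theorem. The function names are hypothetical, and plugging estimates in for $(\alpha_0, \beta_0)$ in `asymptotic_se` is an assumption on our part rather than a procedure described in the paper.

```python
import math

def v_matrix(alpha0, beta0):
    """Limiting covariance V(alpha0, beta0) of the scaled score."""
    a0, b0 = math.exp(alpha0), math.exp(2 * alpha0 + beta0)
    return [[a0 + 2 * b0, b0], [b0, b0 / 2]]

def v_inverse_closed_form(alpha0, beta0):
    """Stated inverse: e^(-alpha0) * [[1, -2], [-2, 4 + 2 e^(-alpha0 - beta0)]]."""
    scale = math.exp(-alpha0)
    return [[scale, -2 * scale],
            [-2 * scale, scale * (4 + 2 * math.exp(-alpha0 - beta0))]]

def matmul2(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def asymptotic_se(alpha, beta, n):
    """Approximate standard errors sqrt(diag(V^{-1}) / n) for (alpha, beta)."""
    inv = v_inverse_closed_form(alpha, beta)
    return math.sqrt(inv[0][0] / n), math.sqrt(inv[1][1] / n)
```

The product `matmul2(v_matrix(a, b), v_inverse_closed_form(a, b))` should be the identity up to rounding for any finite `a` and `b`, and `asymptotic_se` shrinks at the rate $N_v^{-1/2}$, consistent with the effective sample size being of order $N_v$ in the sparse model.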